ABSTRACT Shotgun metagenomics provides a powerful assumption-free approach to the recovery of pathogen genomes from 
contemporary and historical material. We sequenced the metagenome of a calcified nodule from the skeleton of a 14th-century 
middle-aged male excavated from the medieval Sardinian settlement of Geridu. We obtained 6.5-fold coverage of a Brucella 
melitensis genome. Sequence reads from this genome showed signatures typical of ancient or aged DNA. Despite the relatively 
low coverage, we were able to use information from single-nucleotide polymorphisms to place the medieval pathogen genome 
within a clade of B. melitensis strains that included the well-studied Ether strain and two other recent Italian isolates. We confirmed this placement using information from deletions and IS711 insertions. We conclude that metagenomics stands ready to 
document past and present infections, shedding light on the emergence, evolution, and spread of microbial pathogens. 

IMPORTANCE Infectious diseases have shaped human populations and societies throughout history. The recovery of pathogen 
DNA sequences from human remains provides an opportunity to identify and characterize the causes of individual and epidemic 
infections. By sequencing DNA extracted from medieval human remains through shotgun metagenomics, without target-specific 
capture or amplification, we have obtained a draft genome sequence of an ~700-year-old Brucella melitensis strain. Using a vari- 
ety of bioinformatic approaches, we have shown that this historical strain is most closely related to recent strains isolated from 
Italy, confirming the continuity of this zoonotic infection, and even a specific lineage, in the Mediterranean region over the cen- 
turies. 
Brucellosis is a widespread infection of livestock (sheep, goats, 
cattle, cows, and pigs) and remains one of the most common 
zoonotic infections, with more than 500,000 new human cases 
worldwide annually (1). Human brucellosis is most commonly 
caused by the species Brucella melitensis and is usually acquired 
through ingestion of unpasteurized dairy products or, less com- 
monly, through ingestion of infected meat or direct occupational 
contact with animals (1). If left untreated, the infection usually 
follows a chronic course, spreading systemically to the organs of 
the reticuloendothelial system and often leading to osteoarticular 
disease (2). 

Brucellosis is an ancient disease. Vertebral lesions consistent 
with brucellosis have been described in a >2-million-year-old 
male skeleton of Australopithecus africanus (3). Lesions consistent 
with brucellosis have been described in Bronze Age skeletons from 
the Levant and the Basque country, in adult skeletons from Her- 
culaneum, and in medieval human remains (2). 

A major ambition of paleopathology is to shed light on the 
influence of infectious disease on past populations. However, 
morphological analyses are limited in that few infections produce 
durable lesions and very different pathogens can produce similar 
pathologies (4). For example, similar lesions occur in tuberculosis 
and brucellosis, even though the causative organisms are quite 
different taxonomically and in cell structure. Amplification of 
pathogen DNA from human remains via PCR has provided an 
alternative source of information about a range of past infections 
(5). However, only a single study has reported success in using 
PCR amplification to confirm historical brucellosis (6). 

In addition, there are problems with amplification-based ap- 
proaches to the recovery of historical and ancient DNA. First, PCR 
is a competitive and highly sensitive reaction, prone to contami- 
nation even in dedicated facilities (7). Second, these approaches 
generally provide information on a single gene or gene fragment, 
affording little or no insight into pathogen biology, evolution, and 
epidemiology. Third, they require the onerous design and optimization of pathogen-specific primers; this limits the open-endedness of the approach, so that one generally finds only what one is looking for. This last point also applies to genome capture approaches, which have proven successful in recovering Yersinia pestis genomes from samples from the Black Death and the Justinianic plague (8, 9). 

FIG 1 (A) Calcific nodules excavated from the pelvic girdle of skeleton 2568. (B) Size distribution of reads from sample 2568: (i) all reads from the initial MiSeq run (~2 million); (ii) reads from the second MiSeq run (>20 million) that aligned with the B. melitensis 16 M genome; (iii) reads from the dedicated run that mapped to the human genome (hg19). (C) Coverage plot of reads from sample 2568 mapped against the two chromosomes of the B. melitensis 16 M genome. The plot shows the average coverage and standard deviation for each 5,000-bp region in the genome. (D) Evidence of sequence damage associated with aged DNA, showing the frequency of C-to-T and G-to-A transitions at the 5' and 3' ends of DNA fragments, respectively. 

Shotgun metagenomics, that is, the unbiased sequencing en masse of DNA extracted from a sample without target-specific amplification or capture, provides an attractive alternative approach to the detection and characterization of pathogens in contemporary and historical human material. This approach has 
proven successful in obtaining genome-wide sequence data for 
Borrelia burgdorferi, Mycobacterium tuberculosis, and Mycobacte- 
rium leprae from long-dead human remains (10-12). 

When confronted with calcified nodules from a 14th-century 
skeleton, we initially thought of tuberculosis. However, when we 
used shotgun metagenomics to identify potential pathogens in the 
sample, we were instead surprised to recover a medieval Brucella 
melitensis genome sequence. 



RESULTS 

Metagenomic recovery of Brucella melitensis sequences with 
signatures of medieval origin. The skeleton of a 50- to 60-year- 
old male (skeleton 2568) was excavated from the abandoned me- 
dieval village of Geridu (Sorso, Sassari, Italy) in northwest Sar- 
dinia in December 1997 (Fig. 1) (13). The skeleton showed 
features of diffuse idiopathic skeletal hyperostosis (DISH), includ- 
ing fusions between the fourth and tenth thoracic vertebrae, fu- 
sion of the fifth lumbar vertebra to the sacrum, and extraspinal 
enthesopathies (14). Thirty-two calcified nodules were found in 
the pelvic girdle, with diameters ranging from 0.6 by 0.7 cm to 2.2 
by 1.6 cm (Fig. 1). A DNA extraction was performed on one of the 
nodules. 

We obtained 30 ng of DNA from the nodule. This was used to construct a TruSeq Nano Illumina library, which was sequenced at low coverage on an Illumina MiSeq sequencer alongside 10 other bar-coded libraries, 8 from other historical human tissue samples and 2 from blank controls. Just over two million sequences were obtained from sample 2568 on this run. An analysis 
of size distribution was performed, revealing a bimodal distribu- 
tion with a broad peak running from 50 to 150 bp, with a second 
much taller peak running up to the maximum read length of 
250 bp (Fig. 1). On the assumption that historical DNA fragments 
were restricted to the smaller peak, sequences over 150 bp in 
length were excluded from analysis of this run. Homology 
searches revealed sequences from sample 2568 that could be as- 
signed with confidence to the genus Brucella. 
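
The length filter applied to the first run can be sketched in a few lines of Python. This is a minimal illustration, assuming reads are available as uncompressed four-line FASTQ records; the function names are hypothetical, and only the 150-bp cutoff comes from the text above.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)

def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual

def length_histogram(records: Iterable[Record], bin_size: int = 10) -> Counter:
    """Count reads per length bin, to inspect the (bimodal) size distribution."""
    return Counter((len(seq) // bin_size) * bin_size for _, seq, _ in records)

def keep_short_reads(records: Iterable[Record], max_len: int = 150) -> List[Record]:
    """Discard reads longer than max_len, keeping the smaller, putatively historical peak."""
    return [r for r in records if len(r[1]) <= max_len]
```

Plotting `length_histogram` before filtering is one way to see the two peaks; the cutoff itself rests on the assumption, stated above, that historical fragments fall in the smaller peak.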

We then attempted to map reads from all 11 samples against the genome of the B. melitensis reference strain 16 M. We obtained insignificant matches for 10 of the samples (<12 aligned reads per sample), whereas sample 2568 yielded >20,000 paired-end reads (equivalent to 10,000 sequences, because the fragments are shorter than the read length, so both reads of a pair cover the same fragment) that mapped 
against the B. melitensis 16 M genome, providing approximately 
0.7-fold coverage of a medieval Brucella genome from a strain that 
we have called Geridu-1. A coverage plot (Fig. 1) revealed even 
coverage across both chromosomes in the 16 M genome, ruling 
out spurious hits to conserved sequences from environmental 
bacteria. 
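
Fold coverage and the windowed summary shown in the coverage plot (Fig. 1C) reduce to simple arithmetic over per-base depths. The sketch below assumes a list of per-position depths is already available (for example, from `samtools depth -a`, which reports zero-depth positions too); the function names are hypothetical and are not the authors' code.

```python
import statistics
from typing import List, Tuple

def fold_coverage(aligned_bases: int, genome_length: int) -> float:
    """Average fold coverage: total aligned bases divided by genome length."""
    return aligned_bases / genome_length

def windowed_coverage(depth: List[int], window: int = 5000) -> List[Tuple[int, float, float]]:
    """Mean and population standard deviation of depth in consecutive windows.

    depth[i] is the number of reads covering 0-based reference position i.
    Returns (window_start, mean, sd) per window; the last window may be shorter.
    """
    rows = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        rows.append((start, statistics.fmean(chunk), statistics.pstdev(chunk)))
    return rows
```

Even coverage shows up as window means that stay close to the genome-wide average on both chromosomes, rather than isolated spikes over conserved genes.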

To obtain additional sequencing reads, the sample 2568 library 
was sequenced at ~10-fold-higher coverage on a single dedicated 
MiSeq run, which yielded just over 20 million paired-end se- 
quences. When these were mapped at high stringency, 23% of 
these reads mapped against the human genome and 0.48% against 
the B. melitensis 16 M genome (representing 6.5-fold coverage). 
Interestingly, when the reads that mapped to either the B. meliten- 
sis or human genomes were reanalyzed, they showed a much 
tighter size distribution than the library as a whole, with a peak 
centered on 100 bases (Fig. 1). In addition, the reads mapping to the B. melitensis or human genome showed abundant C-to-T and G-to-A base conversions at the 5' and 3' ends, respectively, which is indicative of the damage typical of ancient or aged DNA (Fig. 1). These findings support a medieval origin of these sequences (15). 

SNP-based phylogenetic placement shows that the medieval 
Brucella genome is closely related to recent Italian isolates. Con- 
ventional phylogenetic methods based on identification of trusted 
single-nucleotide polymorphisms (SNPs) cannot be applied to 
low-coverage genome sequences. However, the technique of 
"phylogenetic placement" provides an alternative solution (16). 
Here, one draws on a fixed reference tree, computed from high- 
coverage genomes, and places the unknown query sequence on 
the tree using programs such as pplacer. We used a published set 
of phylogenetically informative SNPs for Brucella spp. (17) and 
analyzed reads from the initial MiSeq run that aligned to equiva- 
lent positions in the 16 M genome. Using this approach, despite 
the low coverage, we could show confidently (with a posterior 
probability of 1) that the Geridu-1 strain clustered most closely 
with the well-characterized Ether strain (ATCC 23458; the reference strain for B. melitensis biovar 3) and was nested within a clade of four B. melitensis strains (see Fig. S1 in the supplemental material). Indeed, with as few as 250 reads, we were able to accurately assign the Geridu-1 strain to the Ether clade (Fig. S2). 

To refine the placement of the Geridu-1 genome, we constructed a broad-based phylogenetic tree from all available modern B. melitensis genomes. This showed that Geridu-1's close relative, the Ether strain, belonged in a distinctive clade along with four other strains, for which only draft genome sequences were available (see Fig. S3 in the supplemental material). We then compared all five strains from the Ether clade and the Geridu-1 genome recovered from the second MiSeq run by calling SNPs against the completed 16 M genome (Table S1) and using them to draw a phylogenetic tree for this clade (Fig. 2). The SNP table and tree showed that the Geridu-1 strain represented the earliest-branching lineage within the Ether clade, separated by 450 to 500 SNPs from any other strain in the clade (Table S2). 

FIG 2 Phylogenetic tree showing the position of the medieval Geridu-1 strain within the Ether clade. Only SNPs at positions with sufficient coverage in the Geridu-1 alignment were included in the construction of the tree. 

At least three of the five contemporary strains that belong to 
the Ether clade originate from the Italian peninsula or from Sicily: 
F15/06-7 is a 2006 human isolate from Sicily, F5/07-239A is a 2007 
small-ruminant isolate from Italy, and the Ether strain itself was 
isolated from an Italian goat in 1961. There are no other Italian isolates in the set of currently available B. melitensis genome sequences, perhaps suggesting that the Ether clade originated in Italy and its associated islands, or at least in the western Mediterranean. We note that in a recent multiple-locus variable-number tandem-repeat analysis (MLVA) study (18), Italian strains clustered separately from most other European strains, and we suspect that Geridu-1 belongs to the western Mediterranean cluster defined by MLVA, although additional genome sequencing of isolates from that cluster is required to confirm this. 

Confirmation of phylogenetic placement using insertions and deletions. To confirm the placement of the Geridu-1 strain within the Ether clade of B. melitensis, we drew upon two additional sources of information: the distribution of deletions and the locations of insertion elements. First, we looked for deletions of >100 bases that occurred in the Geridu-1 genome in comparison to the 16 M strain. We identified 11 such deletions (see Table S3 in the supplemental material). We determined the distribution of these deletions in all available B. melitensis genomes. Nine deletions were found only in Geridu-1 and the five other strains in the Ether clade; the remaining two occurred sporadically in other strains. 
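
Candidate deletions of this kind appear as runs of zero coverage against the reference, and a deletion is clade specific when the set of strains carrying it is exactly the clade. The sketch below assumes per-base depths and a precomputed presence table; the function names, deletion labels, and strain labels in the test are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def zero_coverage_runs(depth: List[int], min_len: int = 101) -> List[Tuple[int, int]]:
    """Half-open, 0-based (start, end) intervals where depth is 0 for >= min_len bases."""
    runs = []
    start = None
    for pos, d in enumerate(depth):
        if d == 0:
            if start is None:
                start = pos  # a zero-coverage run begins here
        else:
            if start is not None and pos - start >= min_len:
                runs.append((start, pos))
            start = None
    # a run may extend to the end of the sequence
    if start is not None and len(depth) - start >= min_len:
        runs.append((start, len(depth)))
    return runs

def clade_specific(presence: Dict[str, Set[str]], clade: Set[str]) -> List[str]:
    """Deletions whose carrier strains are exactly the given clade."""
    return sorted(d for d, strains in presence.items() if strains == clade)
```

The default `min_len=101` encodes the ">100 bases" threshold; at 6.5-fold average coverage some short zero-depth gaps are expected by chance, which is one reason breakpoints were confirmed from clipped reads rather than from depth alone.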

Next, we examined the distribution and location of the insertion element IS711 (also called IS6501), which occurs widely in Brucella spp. (19). When we mapped reads from the Geridu-1 genome to the insertion element, we obtained 52.3-fold coverage. Dividing that by the average coverage for the Geridu-1 genome (6.5-fold) provided us with an estimate of eight IS711 copies in the medieval strain. By analyzing reads that spanned the junctions between the ends of IS711 and the adjacent chromosome, we were able to confirm the existence of eight insertion points in Geridu-1, seven of which also occurred in B. melitensis 16 M and in all other available B. melitensis genomes (Table S3). The IS711 insertion present in Geridu-1 but absent from 16 M was located at position 517784 in chromosome 2, disrupting a gene encoding the hypothetical protein BMEII0494. This insertion point was found in no other B. melitensis strain genome, apart from the other five members of the Ether lineage. These patterns of deletions and IS711 insertions confirm the placement of the Geridu-1 strain within the Ether clade of B. melitensis. 
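
The copy-number estimate is a ratio of coverages, and junction reads are reads that contain an element end followed by chromosomal sequence. Below is a minimal sketch of both steps; `element_end` is a hypothetical placeholder rather than the real IS711 terminal sequence, only the forward strand is searched, and the 20-base flank minimum is an assumption.

```python
from typing import Optional

def estimate_copy_number(element_coverage: float, genome_coverage: float) -> int:
    """Copies of a repeated element, assuming uniform coverage across the genome."""
    return round(element_coverage / genome_coverage)

def junction_flank(read: str, element_end: str, min_flank: int = 20) -> Optional[str]:
    """Return the chromosomal sequence immediately after an element end found in a read.

    element_end is a hypothetical placeholder for the element's terminal bases.
    Returns None if the end is absent or the flank is too short to place uniquely.
    """
    idx = read.find(element_end)
    if idx == -1:
        return None
    flank = read[idx + len(element_end):]
    return flank if len(flank) >= min_flank else None
```

With the values reported above, `estimate_copy_number(52.3, 6.5)` gives 8 (52.3 / 6.5 ≈ 8.05). The recovered flanks would then be mapped to the reference to locate each insertion point.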

DISCUSSION 

Here, we have shown that shotgun metagenomics can be used to 
obtain a Brucella melitensis genome sequence from a medieval 
sample without target-specific amplification or capture. The recently reported success of this approach with two mycobacterial diseases, leprosy and tuberculosis, has been ascribed to the unique properties of the mycobacterial cell wall in preserving bacterial DNA (11, 12). In contrast, this study confirms that whole-genome 
sequences from bacterial pathogens without resilient cell enve- 
lopes can be recovered from human remains by metagenomics 
hundreds or even thousands of years postmortem. However, un- 
like medieval mycobacterial DNA (12), medieval Brucella DNA 
does show signatures of damage associated with an ancient or 
historical provenance. 

This observation complements several other lines of evidence 
that support the authenticity of data obtained from this historical 
sample. First, the sample was processed in facilities dedicated to 
ancient DNA research, where no work on Brucella cultures or 
DNA had ever taken place. Second, laboratory contamination was 
ruled out by the lack of significant hits for Brucella in libraries 
obtained from eight other historical specimens and two blanks 
sequenced in the same run. Third, we observed conclusive and 
extensive matches to a dedicated human and animal pathogen, 
obtaining uniform coverage across a genome nested within the 
B. melitensis phylogeny. This rules out spurious and patchy hits for 
conserved genes from related environmental organisms as a 
source of error. Finally, we took care to sample the interior of the 
nodule, which eliminated the risk of contamination from soil. 

Calcification of soft tissue is a recognized, albeit rare, compli- 
cation of human brucellosis, so we assume that calcification of 
abdominal or pelvic tissues accounts for the appearance of pelvic 
nodules in this individual. The skeletal pathology provides no additional evidence of brucellosis. Instead, the widespread bony changes in skeleton 2568 are diagnostic of diffuse idiopathic skeletal hyperostosis (DISH), a systemic condition of unknown etiology that is characterized by the ossification or calcification of ligaments and entheses, particularly of the thoracic spine, and which is seen most commonly in men over 50 years of age (14). DISH has been reported in numerous ancient and medieval skeletons and is thought to be associated with a privileged and/or sedentary lifestyle; for example, it was seen in the Medici family (20). Whether 
this historical case of brucellosis represents infection through di- 
rect contact with an infected animal (e.g., a shepherd exposed to 
birthing animals) or through ingestion of dairy products remains 
unclear, although the relatively old age at death and the coexis- 
tence of DISH as a potential disease of affluence might favor the 
latter. 

Our findings fit the epidemiological, historical, and geographical contexts in that B. melitensis is most commonly acquired from sheep or goats (1), and sheep (and probably also goat) herding has a long history in Sardinia, as evidenced by distinctive genomic signatures in two local breeds of domesticated sheep and the presence of the Sardinian mouflon, which is thought to represent an archaic feral sheep population that reached Europe from the Neolithic domestication center (21). 

By showing that our medieval Brucella strain is most closely 
related to recent Italian isolates, we have established the continuity 
of this zoonotic infection in the region over the centuries. Brucel- 
losis still occurs on the island of Sardinia, although it is now rela- 
tively well controlled there compared to other regions of endemic- 
ity. Nonetheless, the struggle to control this infection, which has 
afflicted the peoples of the Mediterranean region since ancient 
times, is still ongoing. We conclude that metagenomic approaches 
stand ready to document past infections, shedding light on the 
emergence, evolution, and spread of Brucella and other microbial 
pathogens and to inform current and future infectious disease 
diagnosis and control. 

MATERIALS AND METHODS 

Source material. The source material consisted of 1 of 32 calcified nodules 
found in the pelvic girdle of an adult male skeleton (sample 2568) that was 
excavated from 1997 to 1999 from a cemetery in the medieval rural set- 
tlement of Geridu (Sorso, Sassari, Italy), in northwest Sardinia. Twenty- 
five single pit graves were identified in a well-organized part of the ceme- 
tery (sector 2500). Stratigraphic analysis identified two distinct burial 
phases. Phase I (9 single burials) dates to the first half of the 14th century 
CE; phase II (16 single burials) dates to the final period in which the 
cemetery was used (1350 to 1400 CE). According to historical records, the settlement of Geridu was finally abandoned in 1426 CE. The sample 2568 remains were retrieved from 1 of the 16 burials dating to the second half of the 14th century CE (13). Sex determination was performed on the 
basis of the morphological features of the skull. Age at death was estimated 
on the basis of dental wear and sternal rib end modification (22). Lesions 
indicative of pathologies were recorded in accordance with the methods 
and standards set out in the Global History of Health Project (23). 

DNA extraction. DNA extraction and library preparation were carried 
out in a dedicated ancient DNA laboratory in which no strains of Brucella 
had ever been cultured, no pathogen-specific PCRs had ever been per- 
formed, and in which the handler wore gloves, together with a mask, a 
gown, and a hood. The surface of the calcified nodule was removed using 
a drill bit that had been cleaned with bleach to eliminate potential surface 
contaminants. The undersurface was then removed by drilling at low 
speed to produce approximately 20 mg of powder. The sample was demineralized by incubation for 1 week at 37°C with shaking in 400 µl CTAB (cetyltrimethylammonium bromide) solution and 40 µl proteinase K. 

DNA was isolated with chloroform using the DNeasy plant minikit (Qiagen, United Kingdom), with the following modifications to the manufacturer's protocol so that the kit could be used for historical human tissues. In brief, the demineralized sample was centrifuged at 20,000 × g for 10 min and the supernatant was collected. To homogenize the sample, 400 µl of 0.2 M chloroform was added and the tube was inverted for 10 min. The sample was centrifuged at 6,000 × g for 3 min, and the supernatant was transferred to 2-ml tubes. The manufacturer's DNeasy plant minikit protocol was followed from step 6 with the following additional modifications. Three volumes of buffer AW1 were added to each sample, and the samples were incubated at room temperature for 2 to 3 h. After the addition of buffer AW2, the samples were centrifuged for 3 min. DNA was eluted in two 50-µl aliquots (100 µl total). Extracted DNA was quantified in 5 µl of sample using the Qubit high-sensitivity double-stranded DNA (HS dsDNA) assay according to the manufacturer's instructions (Invitrogen Ltd., Paisley, United Kingdom) and then stored at −20°C until library preparation. 

Sequencing. DNA extracted from sample 2568 was converted into a 
TruSeq Nano library for sequencing on an Illumina MiSeq sequencer 
according to the manufacturer's low-sample protocol (Illumina UK, Little Chesterford, United Kingdom), with the following minor modifications. 
No fragmentation step was included, given the expectation that ancient 
DNA would already be heavily fragmented. DNA was end-repaired within 
the ancient-DNA laboratory, with the modification of incubation for 
90 min at 30°C and size selection for <350-bp inserts. Analysis on an 
Agilent Bioanalyzer 2100 system provided an estimated size distribution 
of fragments with a peak length of 235 bp. 

Once adapters (including bar codes) had been ligated, the sequencing library was moved to a PCR laboratory. In accordance with the manufacturer's instructions, the library fragments were PCR amplified; however, we used 10 PCR cycles instead of the usual 8. The library was quantified (2 µl per sample) using the Qubit HS dsDNA assay according to the manufacturer's instructions (Invitrogen Ltd., Paisley, United Kingdom) and then stored at −20°C. The sample 2568 library was diluted to 4 nM, as determined by analysis on an Agilent Bioanalyzer 2100 and using the Qubit HS dsDNA assay, and then pooled in equimolar amounts with 10 other bar-coded libraries (8 from other historical human tissue samples and 2 from blank controls). The entire library pool was then diluted to 12 pM and sequenced on the first MiSeq run using the Illumina MiSeq v2 2 × 250-bp paired-end protocol. In a second MiSeq run, dedicated entirely to the 2568 library, the 4 nM library was diluted to 12 pM without pooling. 
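
Going from a 4 nM library to a 12 pM loading concentration is a single C1V1 = C2V2 calculation. The sketch below is only illustrative: it collapses the intermediate denaturation and dilution steps of the actual Illumina protocol into one step, and the 600-µl final volume in the usage note is an assumed loading volume, not a value taken from the text.

```python
def dilution_factor(stock_nM: float, target_pM: float) -> float:
    """Fold dilution needed to go from a stock in nM to a target in pM."""
    return (stock_nM * 1000.0) / target_pM

def stock_volume_ul(stock_nM: float, target_pM: float, final_volume_ul: float) -> float:
    """Solve C1 * V1 = C2 * V2 for V1, the volume of stock required (µl)."""
    return final_volume_ul * target_pM / (stock_nM * 1000.0)
```

For example, `dilution_factor(4, 12)` is about 333-fold, and `stock_volume_ul(4, 12, 600)` gives 1.8 µl of the 4 nM library in an assumed 600 µl final volume.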

Identification of Brucella sequences. Metagenomic sequence reads 
from both MiSeq runs using sample 2568 have been deposited in the 
European Nucleotide Archive (project accession number PRJEB6045). 
Sequences derived from sample 2568 in the first MiSeq run that had 
lengths of ≤150 bp were subjected to a BLASTN search against the NCBI 
NR database, and the results were analyzed with MEGAN (24). Reads 
from all samples on the initial MiSeq run were analyzed with Bowtie2 
version 2.1.0 (25), allowing only 1 mismatch per 33 bases of the read 
(under the following settings: --mp 1,1; --ignore-quals; --score-min L,0,-0.033). The reads were mapped against the genomes of the following pathogens: Plasmodium falciparum 3D7, Leishmania infantum JPCM5, Yersinia pestis CO92, Mycobacterium tuberculosis 
H37Rv, and Brucella melitensis 16 M (GenBank accession numbers 
AL844501 to AL844509, AE014185 to AE014188, AE001362, FR796433 to 
FR796468, NC_003143, AL123456, NC_003317, and NC_003318). 
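
The stringency above can be translated into a mismatch budget per read. This is an approximate sketch, assuming end-to-end alignment (where a perfect match scores 0), a quality-independent penalty of 1 per mismatch as implied by the settings listed above, and a linear minimum score with intercept 0 and slope −0.033 per base; it ignores gap and ambiguous-base penalties and does not reproduce Bowtie2's internal rounding.

```python
import math

def min_alignment_score(read_len: int, intercept: float = 0.0, slope: float = -0.033) -> float:
    """Linear minimum-score function: intercept + slope * read length."""
    return intercept + slope * read_len

def max_mismatches(read_len: int, mismatch_penalty: float = 1.0,
                   intercept: float = 0.0, slope: float = -0.033) -> int:
    """Mismatches tolerated in an end-to-end alignment whose perfect score is 0."""
    budget = -min_alignment_score(read_len, intercept, slope)
    return math.floor(budget / mismatch_penalty)
```

Under these assumptions a 30-base read tolerates no mismatches, a 33-base read tolerates 1, and a 100-base read tolerates 3; the budget grows by one mismatch for roughly every 30 bases of read length.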

Phylogenetic placement of Brucella sequences from Geridu at low 
coverage. Previously described lineage-defining SNPs (17) were used to 
construct a tree with FastTree 2.7.1 (26), using neighbor joining and gen- 
eralized time-reversible models of nucleotide evolution. Reads from the 
metagenome were mapped against the reference strain B. melitensis 16 M 
using the default settings in Bowtie2, and the majority base was called at each SNP position with no quality filtering. If no base was present at the 
position, a gap was used. The pplacer suite of programs (16) was used to 
assign the sequence to a position on the tree. 
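
The query sequence passed to phylogenetic placement is a concatenation of majority bases at the lineage-defining SNP positions, with gaps where no read is present. Below is a minimal sketch, assuming the reads have already been summarized into a hypothetical `pileup` dictionary of observed bases per reference position; it does not call pplacer itself.

```python
from collections import Counter
from typing import Dict, List, Sequence

def query_sequence(snp_positions: Sequence[int], pileup: Dict[int, List[str]]) -> str:
    """Concatenate the majority read base at each SNP position; '-' where uncovered.

    pileup maps a reference position to the list of bases observed in reads there.
    No base-quality filtering is applied. Ties go to the base encountered first,
    because Counter.most_common orders equal counts by first occurrence.
    """
    columns = []
    for pos in snp_positions:
        bases = pileup.get(pos, [])
        if bases:
            columns.append(Counter(bases).most_common(1)[0][0])
        else:
            columns.append("-")  # no read covers this SNP: use a gap
    return "".join(columns)
```

At low coverage most columns will be gaps, which placement methods can tolerate because they score the query only against the fixed reference tree.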

Phylogenetic analysis of B. melitensis strains. The phylogeny of all 
B. melitensis strains was analyzed, using information from the PATRIC 
database (27) and a Brucella SNP matrix from the Broad Institute (http://www.broadinstitute.org). To map these strains on the matrix, sequences 
of 160 bases surrounding each SNP were taken from the reference strain, 
B. melitensis 16 M, and this fragment was mapped (using Bowtie2) against 
each strain to ascertain the corresponding base in the genome. If no map- 
ping occurred or the read mapped in two places, a gap was added instead. 
Using this extended matrix, concatenated SNPs for each strain were then 
used to construct a tree using FastTree 2.7.1. 
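
The window-based lookup of each strain's base can be approximated with plain pattern matching. This is a simplified stand-in for the Bowtie2 step, assuming that the flanks match exactly (Bowtie2 tolerates mismatches), that only the forward strand is searched, and that the SNP sits near the middle of the 160-base window; the function name is hypothetical.

```python
import re

def base_in_strain(reference: str, strain_genome: str, snp_pos: int, width: int = 160) -> str:
    """Base in strain_genome at the site matching reference[snp_pos], or '-'.

    A window of about `width` bases around the SNP is taken from the reference;
    the SNP base itself is a wildcard, so the strain may differ there.
    Returns '-' if the window matches nowhere or in more than one place.
    """
    start = max(0, snp_pos - width // 2)
    end = min(len(reference), start + width)
    left = reference[start:snp_pos]
    right = reference[snp_pos + 1:end]
    # Zero-width lookahead so that overlapping occurrences are all counted.
    pattern = re.compile("(?=" + re.escape(left) + "." + re.escape(right) + ")")
    hits = [m.start() for m in pattern.finditer(strain_genome)]
    if len(hits) != 1:
        return "-"
    return strain_genome[hits[0] + len(left)]
```

Returning a gap for both absent and repeated matches mirrors the rule above, so that ambiguous loci do not contribute spurious columns to the extended matrix.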

Calling SNPs from strains in the Ether clade. Five modern draft genomes from strains in the Ether clade (UK31/99, F15/06-7, F5/07-239A, Ether, and R3/07-2) were aligned against B. melitensis 16 M using Progressive Mauve (28), and SNPs were called using the default settings. If any strain had a gap or N, the SNP was discarded. SNPs for the Geridu-1 strain were called from the mapped BAM file using Samtools (29), with the requirements that there was at least 6-fold coverage and that the mutant allele accounted for at least 80% of aligned sequences. The SNPs from Geridu-1 were combined with those from the Ether clade strains, excluding any SNP that occurred at a location where there was less than 6-fold coverage in the Geridu-1 sequence alignment. The remaining 2,332 SNPs (Table S1) were then used to construct a tree using FastTree 2.7.1 and to produce a pairwise dissimilarity matrix. 

Analysis of breakpoints in Geridu-1 and other strains. From manual scrutiny of the coverage plot, we identified regions in the 16 M reference genome where no reads from the Geridu-1 genome mapped over a span of >100 bp. To confirm the existence and refine the boundaries of these deletions, the Geridu-1 reads were remapped against 16 M using the --local option of Bowtie2, which allows the beginning and end of reads to be soft-clipped to obtain an improved alignment. The clipped regions of the reads at the edges of candidate deletions were used to identify the breakpoint created by the deletion. The distribution of the breakpoints in other strains was determined by retrieving the sequences flanking the deletion breakpoints in the Geridu-1 strain and performing a BLASTN search of available B. melitensis genomes. Similar approaches were used to determine the identity and distribution of insertion breakpoints associated with IS711. 

Nucleotide sequence accession numbers. Metagenomic sequence 
reads from this study have been deposited in the European Nucleotide 
Archive (project accession number PRJEB6045). The following URL will retrieve all of the sequences that were used in our searches: http://www.ncbi.nlm.nih.gov/nuccore/AL844501,AL844502,AL844503,AL844504,AL844505,AL844506,AL844507,AL844508,AL844509,AE014185,AE014186,AE014186,AE014187,AE014188,AE001362,FR796433,FR796434,FR796435,FR796436,FR796437,FR796438,NC_003143,AL123456,NC_003317,NC_003318. 